{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 学员必读："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注释：# 后内容为注释，不会被python解释器所解释，不影响程序运行。\n",
    "<br>注释为了方便自己和他人理解程序，我们可以在程序代码中添加一些关于程序功能、算法、函数、数据的说明文字，从而加强程序代码的可读性。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注意：如果同学在使用服务器完成作业时遇到困难，请在社群中向班班或者助教老师寻求帮助呦！\n",
    "<br>Jupyter服务器使用流程请参考以下文档：\n",
    "<br>https://doc.weixin.qq.com/doc/w3_AF4AjAaeAAkU80GMBtLR7CS4Hm5uq?scode=AI4A8gcnABAFNLrTb0AF4AjAaeAAk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 一.读取数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>ProductID</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011/1/1</td>\n",
       "      <td>2011/1/8</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-SU-10000618</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Supplies</td>\n",
       "      <td>Acme Trimmer, High Speed</td>\n",
       "      <td>120.366</td>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "      <td>36.036</td>\n",
       "      <td>9.72</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011/1/1</td>\n",
       "      <td>2011/1/8</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-PA-10001968</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Eaton Computer Printout Paper, 8.5 x 11</td>\n",
       "      <td>55.242</td>\n",
       "      <td>2</td>\n",
       "      <td>0.1</td>\n",
       "      <td>15.342</td>\n",
       "      <td>1.80</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011/1/1</td>\n",
       "      <td>2011/1/8</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>FUR-FU-10003447</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Eldon Light Bulb, Duo Pack</td>\n",
       "      <td>113.670</td>\n",
       "      <td>5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>37.770</td>\n",
       "      <td>4.70</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>IT-2011-3647632</td>\n",
       "      <td>2011/1/1</td>\n",
       "      <td>2011/1/5</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>EM-14140</td>\n",
       "      <td>Eugene Moren</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-PA-10001492</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Enermax Note Cards, Premium</td>\n",
       "      <td>44.865</td>\n",
       "      <td>3</td>\n",
       "      <td>0.5</td>\n",
       "      <td>-26.055</td>\n",
       "      <td>4.82</td>\n",
       "      <td>High</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>HU-2011-1220</td>\n",
       "      <td>2011/1/1</td>\n",
       "      <td>2011/1/5</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>AT-735</td>\n",
       "      <td>Annie Thurman</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-TEN-10001585</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Storage</td>\n",
       "      <td>Tenex Box, Single Width</td>\n",
       "      <td>66.120</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>29.640</td>\n",
       "      <td>8.17</td>\n",
       "      <td>High</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51096</th>\n",
       "      <td>51094</td>\n",
       "      <td>IN-2014-75603</td>\n",
       "      <td>2014/12/31</td>\n",
       "      <td>2015/1/5</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>BS-11365</td>\n",
       "      <td>Bill Shonely</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>Vijayawada</td>\n",
       "      <td>Andhra Pradesh</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-FA-10000263</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Fasteners</td>\n",
       "      <td>Stockwell Thumb Tacks, Bulk Pack</td>\n",
       "      <td>39.420</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17.280</td>\n",
       "      <td>2.97</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51097</th>\n",
       "      <td>51095</td>\n",
       "      <td>TU-2014-5170</td>\n",
       "      <td>2014/12/31</td>\n",
       "      <td>2015/1/4</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>VD-11670</td>\n",
       "      <td>Valerie Dominguez</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Konya</td>\n",
       "      <td>Konya</td>\n",
       "      <td>...</td>\n",
       "      <td>FUR-TEN-10000558</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Tenex Frame, Erganomic</td>\n",
       "      <td>173.760</td>\n",
       "      <td>4</td>\n",
       "      <td>0.6</td>\n",
       "      <td>-117.360</td>\n",
       "      <td>13.72</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51098</th>\n",
       "      <td>51096</td>\n",
       "      <td>MO-2014-2560</td>\n",
       "      <td>2014/12/31</td>\n",
       "      <td>2015/1/5</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LP-7095</td>\n",
       "      <td>Liz Preis</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Agadir</td>\n",
       "      <td>Souss-M</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-WIL-10001069</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Hole Reinforcements, Clear</td>\n",
       "      <td>3.990</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.420</td>\n",
       "      <td>0.49</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51099</th>\n",
       "      <td>51097</td>\n",
       "      <td>ES-2014-4785777</td>\n",
       "      <td>2014/12/31</td>\n",
       "      <td>2015/1/4</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>DP-13390</td>\n",
       "      <td>Dennis Pardue</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-BI-10000620</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Index Tab, Economy</td>\n",
       "      <td>32.250</td>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.250</td>\n",
       "      <td>2.21</td>\n",
       "      <td>Medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51100</th>\n",
       "      <td>51098</td>\n",
       "      <td>CA-2014-143259</td>\n",
       "      <td>2014/12/31</td>\n",
       "      <td>2015/1/4</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PO-18865</td>\n",
       "      <td>Patrick O'Donnell</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>New York City</td>\n",
       "      <td>New York</td>\n",
       "      <td>...</td>\n",
       "      <td>OFF-BI-10003684</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Legal Size Ring Binders</td>\n",
       "      <td>52.776</td>\n",
       "      <td>3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>19.791</td>\n",
       "      <td>7.21</td>\n",
       "      <td>High</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>51101 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       RowID          OrderID   OrderDate  ShipDate        ShipMode  \\\n",
       "0          1    IN-2011-47883    2011/1/1  2011/1/8  Standard Class   \n",
       "1          2    IN-2011-47883    2011/1/1  2011/1/8  Standard Class   \n",
       "2          3    IN-2011-47883    2011/1/1  2011/1/8  Standard Class   \n",
       "3          4  IT-2011-3647632    2011/1/1  2011/1/5    Second Class   \n",
       "4          5     HU-2011-1220    2011/1/1  2011/1/5    Second Class   \n",
       "...      ...              ...         ...       ...             ...   \n",
       "51096  51094    IN-2014-75603  2014/12/31  2015/1/5    Second Class   \n",
       "51097  51095     TU-2014-5170  2014/12/31  2015/1/4    Second Class   \n",
       "51098  51096     MO-2014-2560  2014/12/31  2015/1/5  Standard Class   \n",
       "51099  51097  ES-2014-4785777  2014/12/31  2015/1/4  Standard Class   \n",
       "51100  51098   CA-2014-143259  2014/12/31  2015/1/4  Standard Class   \n",
       "\n",
       "      CustomerID       CustomerName      Segment           City  \\\n",
       "0       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "1       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "2       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "3       EM-14140       Eugene Moren  Home Office      Stockholm   \n",
       "4         AT-735      Annie Thurman     Consumer       Budapest   \n",
       "...          ...                ...          ...            ...   \n",
       "51096   BS-11365       Bill Shonely    Corporate     Vijayawada   \n",
       "51097   VD-11670  Valerie Dominguez     Consumer          Konya   \n",
       "51098    LP-7095          Liz Preis     Consumer         Agadir   \n",
       "51099   DP-13390      Dennis Pardue  Home Office        Hamburg   \n",
       "51100   PO-18865  Patrick O'Donnell     Consumer  New York City   \n",
       "\n",
       "                 State  ...         ProductID         Category Sub-Category  \\\n",
       "0      New South Wales  ...   OFF-SU-10000618  Office Supplies     Supplies   \n",
       "1      New South Wales  ...   OFF-PA-10001968  Office Supplies        Paper   \n",
       "2      New South Wales  ...   FUR-FU-10003447        Furniture  Furnishings   \n",
       "3            Stockholm  ...   OFF-PA-10001492  Office Supplies        Paper   \n",
       "4             Budapest  ...  OFF-TEN-10001585  Office Supplies      Storage   \n",
       "...                ...  ...               ...              ...          ...   \n",
       "51096   Andhra Pradesh  ...   OFF-FA-10000263  Office Supplies    Fasteners   \n",
       "51097            Konya  ...  FUR-TEN-10000558        Furniture  Furnishings   \n",
       "51098          Souss-M  ...  OFF-WIL-10001069  Office Supplies      Binders   \n",
       "51099          Hamburg  ...   OFF-BI-10000620  Office Supplies      Binders   \n",
       "51100         New York  ...   OFF-BI-10003684  Office Supplies      Binders   \n",
       "\n",
       "                                   ProductName    Sales Quantity Discount  \\\n",
       "0                     Acme Trimmer, High Speed  120.366        3      0.1   \n",
       "1      Eaton Computer Printout Paper, 8.5 x 11   55.242        2      0.1   \n",
       "2                   Eldon Light Bulb, Duo Pack  113.670        5      0.1   \n",
       "3                  Enermax Note Cards, Premium   44.865        3      0.5   \n",
       "4                      Tenex Box, Single Width   66.120        4      0.0   \n",
       "...                                        ...      ...      ...      ...   \n",
       "51096         Stockwell Thumb Tacks, Bulk Pack   39.420        3      0.0   \n",
       "51097                   Tenex Frame, Erganomic  173.760        4      0.6   \n",
       "51098  Wilson Jones Hole Reinforcements, Clear    3.990        1      0.0   \n",
       "51099          Wilson Jones Index Tab, Economy   32.250        5      0.0   \n",
       "51100     Wilson Jones Legal Size Ring Binders   52.776        3      0.2   \n",
       "\n",
       "        Profit  ShippingCost  OrderPriority  \n",
       "0       36.036          9.72         Medium  \n",
       "1       15.342          1.80         Medium  \n",
       "2       37.770          4.70         Medium  \n",
       "3      -26.055          4.82           High  \n",
       "4       29.640          8.17           High  \n",
       "...        ...           ...            ...  \n",
       "51096   17.280          2.97         Medium  \n",
       "51097 -117.360         13.72         Medium  \n",
       "51098    0.420          0.49         Medium  \n",
       "51099    8.250          2.21         Medium  \n",
       "51100   19.791          7.21           High  \n",
       "\n",
       "[51101 rows x 24 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "data = pd.read_csv('dataset.csv',encoding='ISO-8859-1') \n",
    "data"
   ]
  },
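  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of what the `encoding` argument does, using a small in-memory CSV rather than `dataset.csv`. The sample text and the names `text`, `raw`, `demo`, and `utf8_ok` are assumptions made for illustration; they are not part of the course dataset.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample text; not taken from dataset.csv.\n",
    "text = '''CustomerName,City\n",
    "José Díaz,Málaga\n",
    "'''\n",
    "raw = text.encode('ISO-8859-1')  # the bytes as they would sit on disk\n",
    "\n",
    "# Decoding with the matching encoding recovers the accented characters.\n",
    "demo = pd.read_csv(io.BytesIO(raw), encoding='ISO-8859-1')\n",
    "print(demo)\n",
    "\n",
    "# The same bytes are not valid UTF-8, so decoding them as UTF-8 fails.\n",
    "try:\n",
    "    pd.read_csv(io.BytesIO(raw), encoding='utf-8')\n",
    "    utf8_ok = True\n",
    "except UnicodeError:\n",
    "    utf8_ok = False\n",
    "print(utf8_ok)\n",
    "```\n",
    "\n",
    "If `read_csv` raises a decoding error on a real file, the encoding the file was saved with is usually the first thing to check."
   ]
  },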
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 二.数据清洗"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#1.确定数据范围\n",
    "#找到业务数据符合业务规则 \n",
    "\n",
    "# 2.清洗数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.根据业务需要提取数据,发货日期早于下单日期 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#1)转换字段【ShipDate】的时间类型\n",
    "data['ShipDate'] = pd.to_datetime(data['ShipDate'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#根据以上操作转换字段【OrderDate】的时间类型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#请输入代码\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "#参考答案\n",
    "\n",
    "data['OrderDate'] = pd.to_datetime(data['OrderDate'])"
   ]
  },
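  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The conversion above can also be written with an explicit format. The following is a minimal sketch on a hypothetical three-row frame (the name `demo` and its values are assumptions, including the deliberately invalid entry); it is not the course's required solution.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample in the same 'YYYY/M/D' style as the raw date columns.\n",
    "demo = pd.DataFrame({'OrderDate': ['2011/1/1', '2014/12/31', 'not a date']})\n",
    "\n",
    "# format= states the expected layout; errors='coerce' turns values that\n",
    "# cannot be parsed into NaT instead of raising an exception.\n",
    "demo['OrderDate'] = pd.to_datetime(\n",
    "    demo['OrderDate'], format='%Y/%m/%d', errors='coerce'\n",
    ")\n",
    "\n",
    "print(demo['OrderDate'].dtype)         # a datetime64 dtype\n",
    "print(demo['OrderDate'].isna().sum())  # count of values that failed to parse\n",
    "```\n",
    "\n",
    "Counting the `NaT` values after conversion is a quick way to see whether any dates in a column were malformed."
   ]
  },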
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#2)计算【ShipDate】和【OrderDate】之间的时间差  \n",
    "#data['interval'] = (data['***']-data['***']).dt.total_seconds()\n",
    "#data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#请输入代码\n"
   ]
  },
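  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how subtracting two datetime columns behaves, on a hypothetical two-row frame. The name `demo`, its values, and the extra `interval_days` column are assumptions for illustration; only `interval` mirrors the exercise.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical orders; the second one has a ship date before its order date.\n",
    "demo = pd.DataFrame({\n",
    "    'OrderDate': pd.to_datetime(['2011-01-01', '2011-01-10']),\n",
    "    'ShipDate': pd.to_datetime(['2011-01-08', '2011-01-09']),\n",
    "})\n",
    "\n",
    "# Subtracting two datetime columns yields a timedelta column.\n",
    "delta = demo['ShipDate'] - demo['OrderDate']\n",
    "demo['interval'] = delta.dt.total_seconds()  # seconds, as float\n",
    "demo['interval_days'] = delta.dt.days        # whole days, as int\n",
    "\n",
    "# A negative interval means the order shipped before it was placed.\n",
    "suspect = demo[demo['interval'] < 0]\n",
    "print(demo)\n",
    "print(suspect)\n",
    "```\n",
    "\n",
    "Seven days is 604800 seconds, so an interval in seconds can be read back as days by dividing by 86400."
   ]
  },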
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Supplies</td>\n",
       "      <td>Acme Trimmer, High Speed</td>\n",
       "      <td>120.366</td>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "      <td>36.036</td>\n",
       "      <td>9.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Eaton Computer Printout Paper, 8.5 x 11</td>\n",
       "      <td>55.242</td>\n",
       "      <td>2</td>\n",
       "      <td>0.1</td>\n",
       "      <td>15.342</td>\n",
       "      <td>1.80</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Eldon Light Bulb, Duo Pack</td>\n",
       "      <td>113.670</td>\n",
       "      <td>5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>37.770</td>\n",
       "      <td>4.70</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>IT-2011-3647632</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>EM-14140</td>\n",
       "      <td>Eugene Moren</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Enermax Note Cards, Premium</td>\n",
       "      <td>44.865</td>\n",
       "      <td>3</td>\n",
       "      <td>0.5</td>\n",
       "      <td>-26.055</td>\n",
       "      <td>4.82</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>HU-2011-1220</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>AT-735</td>\n",
       "      <td>Annie Thurman</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Storage</td>\n",
       "      <td>Tenex Box, Single Width</td>\n",
       "      <td>66.120</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>29.640</td>\n",
       "      <td>8.17</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51096</th>\n",
       "      <td>51094</td>\n",
       "      <td>IN-2014-75603</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>BS-11365</td>\n",
       "      <td>Bill Shonely</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>Vijayawada</td>\n",
       "      <td>Andhra Pradesh</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Fasteners</td>\n",
       "      <td>Stockwell Thumb Tacks, Bulk Pack</td>\n",
       "      <td>39.420</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17.280</td>\n",
       "      <td>2.97</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51097</th>\n",
       "      <td>51095</td>\n",
       "      <td>TU-2014-5170</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>VD-11670</td>\n",
       "      <td>Valerie Dominguez</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Konya</td>\n",
       "      <td>Konya</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Tenex Frame, Erganomic</td>\n",
       "      <td>173.760</td>\n",
       "      <td>4</td>\n",
       "      <td>0.6</td>\n",
       "      <td>-117.360</td>\n",
       "      <td>13.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51098</th>\n",
       "      <td>51096</td>\n",
       "      <td>MO-2014-2560</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LP-7095</td>\n",
       "      <td>Liz Preis</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Agadir</td>\n",
       "      <td>Souss-M</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Hole Reinforcements, Clear</td>\n",
       "      <td>3.990</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.420</td>\n",
       "      <td>0.49</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51099</th>\n",
       "      <td>51097</td>\n",
       "      <td>ES-2014-4785777</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>DP-13390</td>\n",
       "      <td>Dennis Pardue</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Index Tab, Economy</td>\n",
       "      <td>32.250</td>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.250</td>\n",
       "      <td>2.21</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51100</th>\n",
       "      <td>51098</td>\n",
       "      <td>CA-2014-143259</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PO-18865</td>\n",
       "      <td>Patrick O'Donnell</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>New York City</td>\n",
       "      <td>New York</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Legal Size Ring Binders</td>\n",
       "      <td>52.776</td>\n",
       "      <td>3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>19.791</td>\n",
       "      <td>7.21</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>51101 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       RowID          OrderID  OrderDate   ShipDate        ShipMode  \\\n",
       "0          1    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "1          2    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "2          3    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "3          4  IT-2011-3647632 2011-01-01 2011-01-05    Second Class   \n",
       "4          5     HU-2011-1220 2011-01-01 2011-01-05    Second Class   \n",
       "...      ...              ...        ...        ...             ...   \n",
       "51096  51094    IN-2014-75603 2014-12-31 2015-01-05    Second Class   \n",
       "51097  51095     TU-2014-5170 2014-12-31 2015-01-04    Second Class   \n",
       "51098  51096     MO-2014-2560 2014-12-31 2015-01-05  Standard Class   \n",
       "51099  51097  ES-2014-4785777 2014-12-31 2015-01-04  Standard Class   \n",
       "51100  51098   CA-2014-143259 2014-12-31 2015-01-04  Standard Class   \n",
       "\n",
       "      CustomerID       CustomerName      Segment           City  \\\n",
       "0       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "1       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "2       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "3       EM-14140       Eugene Moren  Home Office      Stockholm   \n",
       "4         AT-735      Annie Thurman     Consumer       Budapest   \n",
       "...          ...                ...          ...            ...   \n",
       "51096   BS-11365       Bill Shonely    Corporate     Vijayawada   \n",
       "51097   VD-11670  Valerie Dominguez     Consumer          Konya   \n",
       "51098    LP-7095          Liz Preis     Consumer         Agadir   \n",
       "51099   DP-13390      Dennis Pardue  Home Office        Hamburg   \n",
       "51100   PO-18865  Patrick O'Donnell     Consumer  New York City   \n",
       "\n",
       "                 State  ...         Category  Sub-Category  \\\n",
       "0      New South Wales  ...  Office Supplies      Supplies   \n",
       "1      New South Wales  ...  Office Supplies         Paper   \n",
       "2      New South Wales  ...        Furniture   Furnishings   \n",
       "3            Stockholm  ...  Office Supplies         Paper   \n",
       "4             Budapest  ...  Office Supplies       Storage   \n",
       "...                ...  ...              ...           ...   \n",
       "51096   Andhra Pradesh  ...  Office Supplies     Fasteners   \n",
       "51097            Konya  ...        Furniture   Furnishings   \n",
       "51098          Souss-M  ...  Office Supplies       Binders   \n",
       "51099          Hamburg  ...  Office Supplies       Binders   \n",
       "51100         New York  ...  Office Supplies       Binders   \n",
       "\n",
       "                                   ProductName    Sales Quantity Discount  \\\n",
       "0                     Acme Trimmer, High Speed  120.366        3      0.1   \n",
       "1      Eaton Computer Printout Paper, 8.5 x 11   55.242        2      0.1   \n",
       "2                   Eldon Light Bulb, Duo Pack  113.670        5      0.1   \n",
       "3                  Enermax Note Cards, Premium   44.865        3      0.5   \n",
       "4                      Tenex Box, Single Width   66.120        4      0.0   \n",
       "...                                        ...      ...      ...      ...   \n",
       "51096         Stockwell Thumb Tacks, Bulk Pack   39.420        3      0.0   \n",
       "51097                   Tenex Frame, Erganomic  173.760        4      0.6   \n",
       "51098  Wilson Jones Hole Reinforcements, Clear    3.990        1      0.0   \n",
       "51099          Wilson Jones Index Tab, Economy   32.250        5      0.0   \n",
       "51100     Wilson Jones Legal Size Ring Binders   52.776        3      0.2   \n",
       "\n",
       "        Profit ShippingCost  OrderPriority  interval  \n",
       "0       36.036         9.72         Medium  604800.0  \n",
       "1       15.342         1.80         Medium  604800.0  \n",
       "2       37.770         4.70         Medium  604800.0  \n",
       "3      -26.055         4.82           High  345600.0  \n",
       "4       29.640         8.17           High  345600.0  \n",
       "...        ...          ...            ...       ...  \n",
       "51096   17.280         2.97         Medium  432000.0  \n",
       "51097 -117.360        13.72         Medium  345600.0  \n",
       "51098    0.420         0.49         Medium  432000.0  \n",
       "51099    8.250         2.21         Medium  345600.0  \n",
       "51100   19.791         7.21           High  345600.0  \n",
       "\n",
       "[51101 rows x 25 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#参考答案\n",
    "\n",
    "data['interval'] = (data['ShipDate']-data['OrderDate']).dt.total_seconds()\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#3)找时间差，剔除不符合业务常识的数据，即ShipDate早于OrderDate\n",
    "# data.drop(index=data[data.interval<***].index,inplace=True) \n",
    "#data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#请输入代码\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Supplies</td>\n",
       "      <td>Acme Trimmer, High Speed</td>\n",
       "      <td>120.366</td>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "      <td>36.036</td>\n",
       "      <td>9.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Eaton Computer Printout Paper, 8.5 x 11</td>\n",
       "      <td>55.242</td>\n",
       "      <td>2</td>\n",
       "      <td>0.1</td>\n",
       "      <td>15.342</td>\n",
       "      <td>1.80</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Eldon Light Bulb, Duo Pack</td>\n",
       "      <td>113.670</td>\n",
       "      <td>5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>37.770</td>\n",
       "      <td>4.70</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>IT-2011-3647632</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>EM-14140</td>\n",
       "      <td>Eugene Moren</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Enermax Note Cards, Premium</td>\n",
       "      <td>44.865</td>\n",
       "      <td>3</td>\n",
       "      <td>0.5</td>\n",
       "      <td>-26.055</td>\n",
       "      <td>4.82</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>HU-2011-1220</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>AT-735</td>\n",
       "      <td>Annie Thurman</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Storage</td>\n",
       "      <td>Tenex Box, Single Width</td>\n",
       "      <td>66.120</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>29.640</td>\n",
       "      <td>8.17</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51096</th>\n",
       "      <td>51094</td>\n",
       "      <td>IN-2014-75603</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>BS-11365</td>\n",
       "      <td>Bill Shonely</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>Vijayawada</td>\n",
       "      <td>Andhra Pradesh</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Fasteners</td>\n",
       "      <td>Stockwell Thumb Tacks, Bulk Pack</td>\n",
       "      <td>39.420</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17.280</td>\n",
       "      <td>2.97</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51097</th>\n",
       "      <td>51095</td>\n",
       "      <td>TU-2014-5170</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>VD-11670</td>\n",
       "      <td>Valerie Dominguez</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Konya</td>\n",
       "      <td>Konya</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Tenex Frame, Erganomic</td>\n",
       "      <td>173.760</td>\n",
       "      <td>4</td>\n",
       "      <td>0.6</td>\n",
       "      <td>-117.360</td>\n",
       "      <td>13.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51098</th>\n",
       "      <td>51096</td>\n",
       "      <td>MO-2014-2560</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LP-7095</td>\n",
       "      <td>Liz Preis</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Agadir</td>\n",
       "      <td>Souss-M</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Hole Reinforcements, Clear</td>\n",
       "      <td>3.990</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.420</td>\n",
       "      <td>0.49</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51099</th>\n",
       "      <td>51097</td>\n",
       "      <td>ES-2014-4785777</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>DP-13390</td>\n",
       "      <td>Dennis Pardue</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Index Tab, Economy</td>\n",
       "      <td>32.250</td>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.250</td>\n",
       "      <td>2.21</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51100</th>\n",
       "      <td>51098</td>\n",
       "      <td>CA-2014-143259</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PO-18865</td>\n",
       "      <td>Patrick O'Donnell</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>New York City</td>\n",
       "      <td>New York</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Legal Size Ring Binders</td>\n",
       "      <td>52.776</td>\n",
       "      <td>3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>19.791</td>\n",
       "      <td>7.21</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>51097 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       RowID          OrderID  OrderDate   ShipDate        ShipMode  \\\n",
       "0          1    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "1          2    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "2          3    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "3          4  IT-2011-3647632 2011-01-01 2011-01-05    Second Class   \n",
       "4          5     HU-2011-1220 2011-01-01 2011-01-05    Second Class   \n",
       "...      ...              ...        ...        ...             ...   \n",
       "51096  51094    IN-2014-75603 2014-12-31 2015-01-05    Second Class   \n",
       "51097  51095     TU-2014-5170 2014-12-31 2015-01-04    Second Class   \n",
       "51098  51096     MO-2014-2560 2014-12-31 2015-01-05  Standard Class   \n",
       "51099  51097  ES-2014-4785777 2014-12-31 2015-01-04  Standard Class   \n",
       "51100  51098   CA-2014-143259 2014-12-31 2015-01-04  Standard Class   \n",
       "\n",
       "      CustomerID       CustomerName      Segment           City  \\\n",
       "0       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "1       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "2       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "3       EM-14140       Eugene Moren  Home Office      Stockholm   \n",
       "4         AT-735      Annie Thurman     Consumer       Budapest   \n",
       "...          ...                ...          ...            ...   \n",
       "51096   BS-11365       Bill Shonely    Corporate     Vijayawada   \n",
       "51097   VD-11670  Valerie Dominguez     Consumer          Konya   \n",
       "51098    LP-7095          Liz Preis     Consumer         Agadir   \n",
       "51099   DP-13390      Dennis Pardue  Home Office        Hamburg   \n",
       "51100   PO-18865  Patrick O'Donnell     Consumer  New York City   \n",
       "\n",
       "                 State  ...         Category  Sub-Category  \\\n",
       "0      New South Wales  ...  Office Supplies      Supplies   \n",
       "1      New South Wales  ...  Office Supplies         Paper   \n",
       "2      New South Wales  ...        Furniture   Furnishings   \n",
       "3            Stockholm  ...  Office Supplies         Paper   \n",
       "4             Budapest  ...  Office Supplies       Storage   \n",
       "...                ...  ...              ...           ...   \n",
       "51096   Andhra Pradesh  ...  Office Supplies     Fasteners   \n",
       "51097            Konya  ...        Furniture   Furnishings   \n",
       "51098          Souss-M  ...  Office Supplies       Binders   \n",
       "51099          Hamburg  ...  Office Supplies       Binders   \n",
       "51100         New York  ...  Office Supplies       Binders   \n",
       "\n",
       "                                   ProductName    Sales Quantity Discount  \\\n",
       "0                     Acme Trimmer, High Speed  120.366        3      0.1   \n",
       "1      Eaton Computer Printout Paper, 8.5 x 11   55.242        2      0.1   \n",
       "2                   Eldon Light Bulb, Duo Pack  113.670        5      0.1   \n",
       "3                  Enermax Note Cards, Premium   44.865        3      0.5   \n",
       "4                      Tenex Box, Single Width   66.120        4      0.0   \n",
       "...                                        ...      ...      ...      ...   \n",
       "51096         Stockwell Thumb Tacks, Bulk Pack   39.420        3      0.0   \n",
       "51097                   Tenex Frame, Erganomic  173.760        4      0.6   \n",
       "51098  Wilson Jones Hole Reinforcements, Clear    3.990        1      0.0   \n",
       "51099          Wilson Jones Index Tab, Economy   32.250        5      0.0   \n",
       "51100     Wilson Jones Legal Size Ring Binders   52.776        3      0.2   \n",
       "\n",
       "        Profit ShippingCost  OrderPriority  interval  \n",
       "0       36.036         9.72         Medium  604800.0  \n",
       "1       15.342         1.80         Medium  604800.0  \n",
       "2       37.770         4.70         Medium  604800.0  \n",
       "3      -26.055         4.82           High  345600.0  \n",
       "4       29.640         8.17           High  345600.0  \n",
       "...        ...          ...            ...       ...  \n",
       "51096   17.280         2.97         Medium  432000.0  \n",
       "51097 -117.360        13.72         Medium  345600.0  \n",
       "51098    0.420         0.49         Medium  432000.0  \n",
       "51099    8.250         2.21         Medium  345600.0  \n",
       "51100   19.791         7.21           High  345600.0  \n",
       "\n",
       "[51097 rows x 25 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#参考答案\n",
    "\n",
    "data.drop(index=data[data.interval<0].index,inplace=True) \n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.找出售价为负的数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#售价为负 \n",
    "#data[data.Sales***]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#请输入代码\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>0 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [RowID, OrderID, OrderDate, ShipDate, ShipMode, CustomerID, CustomerName, Segment, City, State, Country, PostalCode, Market, Region, ProductID, Category, Sub-Category, ProductName, Sales, Quantity, Discount, Profit, ShippingCost, OrderPriority, interval]\n",
       "Index: []\n",
       "\n",
       "[0 rows x 25 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#参考答案\n",
    "\n",
    "data[data.Sales<0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.查看数据 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 51097 entries, 0 to 51100\n",
      "Data columns (total 25 columns):\n",
      " #   Column         Non-Null Count  Dtype         \n",
      "---  ------         --------------  -----         \n",
      " 0   RowID          51097 non-null  int64         \n",
      " 1   OrderID        51097 non-null  object        \n",
      " 2   OrderDate      51097 non-null  datetime64[ns]\n",
      " 3   ShipDate       51097 non-null  datetime64[ns]\n",
      " 4   ShipMode       51086 non-null  object        \n",
      " 5   CustomerID     51097 non-null  object        \n",
      " 6   CustomerName   51097 non-null  object        \n",
      " 7   Segment        51097 non-null  object        \n",
      " 8   City           51097 non-null  object        \n",
      " 9   State          51097 non-null  object        \n",
      " 10  Country        51097 non-null  object        \n",
      " 11  PostalCode     9962 non-null   float64       \n",
      " 12  Market         51097 non-null  object        \n",
      " 13  Region         51097 non-null  object        \n",
      " 14  ProductID      51097 non-null  object        \n",
      " 15  Category       51097 non-null  object        \n",
      " 16  Sub-Category   51097 non-null  object        \n",
      " 17  ProductName    51097 non-null  object        \n",
      " 18  Sales          51097 non-null  float64       \n",
      " 19  Quantity       51097 non-null  int64         \n",
      " 20  Discount       51097 non-null  float64       \n",
      " 21  Profit         51097 non-null  float64       \n",
      " 22  ShippingCost   51097 non-null  float64       \n",
      " 23  OrderPriority  51097 non-null  object        \n",
      " 24  interval       51097 non-null  float64       \n",
      "dtypes: datetime64[ns](2), float64(6), int64(2), object(15)\n",
      "memory usage: 10.1+ MB\n"
     ]
    }
   ],
   "source": [
    "data.info() "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.数据清洗"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#脏数据  空值\\异常值\\重复值 \n",
    "#手段    弥补(字符串:众数;数字类型:平均值 ) , 删除(drop(col=[]))  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Supplies</td>\n",
       "      <td>Acme Trimmer, High Speed</td>\n",
       "      <td>120.366</td>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "      <td>36.036</td>\n",
       "      <td>9.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Eaton Computer Printout Paper, 8.5 x 11</td>\n",
       "      <td>55.242</td>\n",
       "      <td>2</td>\n",
       "      <td>0.1</td>\n",
       "      <td>15.342</td>\n",
       "      <td>1.80</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Eldon Light Bulb, Duo Pack</td>\n",
       "      <td>113.670</td>\n",
       "      <td>5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>37.770</td>\n",
       "      <td>4.70</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>IT-2011-3647632</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>EM-14140</td>\n",
       "      <td>Eugene Moren</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Enermax Note Cards, Premium</td>\n",
       "      <td>44.865</td>\n",
       "      <td>3</td>\n",
       "      <td>0.5</td>\n",
       "      <td>-26.055</td>\n",
       "      <td>4.82</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>HU-2011-1220</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>AT-735</td>\n",
       "      <td>Annie Thurman</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Storage</td>\n",
       "      <td>Tenex Box, Single Width</td>\n",
       "      <td>66.120</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>29.640</td>\n",
       "      <td>8.17</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51096</th>\n",
       "      <td>51094</td>\n",
       "      <td>IN-2014-75603</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>BS-11365</td>\n",
       "      <td>Bill Shonely</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>Vijayawada</td>\n",
       "      <td>Andhra Pradesh</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Fasteners</td>\n",
       "      <td>Stockwell Thumb Tacks, Bulk Pack</td>\n",
       "      <td>39.420</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17.280</td>\n",
       "      <td>2.97</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51097</th>\n",
       "      <td>51095</td>\n",
       "      <td>TU-2014-5170</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>VD-11670</td>\n",
       "      <td>Valerie Dominguez</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Konya</td>\n",
       "      <td>Konya</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Tenex Frame, Erganomic</td>\n",
       "      <td>173.760</td>\n",
       "      <td>4</td>\n",
       "      <td>0.6</td>\n",
       "      <td>-117.360</td>\n",
       "      <td>13.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51098</th>\n",
       "      <td>51096</td>\n",
       "      <td>MO-2014-2560</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LP-7095</td>\n",
       "      <td>Liz Preis</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Agadir</td>\n",
       "      <td>Souss-M</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Hole Reinforcements, Clear</td>\n",
       "      <td>3.990</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.420</td>\n",
       "      <td>0.49</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51099</th>\n",
       "      <td>51097</td>\n",
       "      <td>ES-2014-4785777</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>DP-13390</td>\n",
       "      <td>Dennis Pardue</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Index Tab, Economy</td>\n",
       "      <td>32.250</td>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.250</td>\n",
       "      <td>2.21</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51100</th>\n",
       "      <td>51098</td>\n",
       "      <td>CA-2014-143259</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PO-18865</td>\n",
       "      <td>Patrick O'Donnell</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>New York City</td>\n",
       "      <td>New York</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Legal Size Ring Binders</td>\n",
       "      <td>52.776</td>\n",
       "      <td>3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>19.791</td>\n",
       "      <td>7.21</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>51094 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       RowID          OrderID  OrderDate   ShipDate        ShipMode  \\\n",
       "0          1    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "1          2    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "2          3    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "3          4  IT-2011-3647632 2011-01-01 2011-01-05    Second Class   \n",
       "4          5     HU-2011-1220 2011-01-01 2011-01-05    Second Class   \n",
       "...      ...              ...        ...        ...             ...   \n",
       "51096  51094    IN-2014-75603 2014-12-31 2015-01-05    Second Class   \n",
       "51097  51095     TU-2014-5170 2014-12-31 2015-01-04    Second Class   \n",
       "51098  51096     MO-2014-2560 2014-12-31 2015-01-05  Standard Class   \n",
       "51099  51097  ES-2014-4785777 2014-12-31 2015-01-04  Standard Class   \n",
       "51100  51098   CA-2014-143259 2014-12-31 2015-01-04  Standard Class   \n",
       "\n",
       "      CustomerID       CustomerName      Segment           City  \\\n",
       "0       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "1       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "2       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "3       EM-14140       Eugene Moren  Home Office      Stockholm   \n",
       "4         AT-735      Annie Thurman     Consumer       Budapest   \n",
       "...          ...                ...          ...            ...   \n",
       "51096   BS-11365       Bill Shonely    Corporate     Vijayawada   \n",
       "51097   VD-11670  Valerie Dominguez     Consumer          Konya   \n",
       "51098    LP-7095          Liz Preis     Consumer         Agadir   \n",
       "51099   DP-13390      Dennis Pardue  Home Office        Hamburg   \n",
       "51100   PO-18865  Patrick O'Donnell     Consumer  New York City   \n",
       "\n",
       "                 State  ...         Category  Sub-Category  \\\n",
       "0      New South Wales  ...  Office Supplies      Supplies   \n",
       "1      New South Wales  ...  Office Supplies         Paper   \n",
       "2      New South Wales  ...        Furniture   Furnishings   \n",
       "3            Stockholm  ...  Office Supplies         Paper   \n",
       "4             Budapest  ...  Office Supplies       Storage   \n",
       "...                ...  ...              ...           ...   \n",
       "51096   Andhra Pradesh  ...  Office Supplies     Fasteners   \n",
       "51097            Konya  ...        Furniture   Furnishings   \n",
       "51098          Souss-M  ...  Office Supplies       Binders   \n",
       "51099          Hamburg  ...  Office Supplies       Binders   \n",
       "51100         New York  ...  Office Supplies       Binders   \n",
       "\n",
       "                                   ProductName    Sales Quantity Discount  \\\n",
       "0                     Acme Trimmer, High Speed  120.366        3      0.1   \n",
       "1      Eaton Computer Printout Paper, 8.5 x 11   55.242        2      0.1   \n",
       "2                   Eldon Light Bulb, Duo Pack  113.670        5      0.1   \n",
       "3                  Enermax Note Cards, Premium   44.865        3      0.5   \n",
       "4                      Tenex Box, Single Width   66.120        4      0.0   \n",
       "...                                        ...      ...      ...      ...   \n",
       "51096         Stockwell Thumb Tacks, Bulk Pack   39.420        3      0.0   \n",
       "51097                   Tenex Frame, Erganomic  173.760        4      0.6   \n",
       "51098  Wilson Jones Hole Reinforcements, Clear    3.990        1      0.0   \n",
       "51099          Wilson Jones Index Tab, Economy   32.250        5      0.0   \n",
       "51100     Wilson Jones Legal Size Ring Binders   52.776        3      0.2   \n",
       "\n",
       "        Profit ShippingCost  OrderPriority  interval  \n",
       "0       36.036         9.72         Medium  604800.0  \n",
       "1       15.342         1.80         Medium  604800.0  \n",
       "2       37.770         4.70         Medium  604800.0  \n",
       "3      -26.055         4.82           High  345600.0  \n",
       "4       29.640         8.17           High  345600.0  \n",
       "...        ...          ...            ...       ...  \n",
       "51096   17.280         2.97         Medium  432000.0  \n",
       "51097 -117.360        13.72         Medium  345600.0  \n",
       "51098    0.420         0.49         Medium  432000.0  \n",
       "51099    8.250         2.21         Medium  345600.0  \n",
       "51100   19.791         7.21           High  345600.0  \n",
       "\n",
       "[51094 rows x 25 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
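   ],
   "source": [
    "# 1) Duplicate values\n",
    "# unique() returns the distinct values; compare its size with the row count\n",
    "\n",
    "data.RowID.unique().size\n",
    "data[data.RowID.duplicated()]  # rows whose RowID repeats an earlier one\n",
    "data.drop(index=data[data.RowID.duplicated()].index,inplace=True)\n",
    "data"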
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>0 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [RowID, OrderID, OrderDate, ShipDate, ShipMode, CustomerID, CustomerName, Segment, City, State, Country, PostalCode, Market, Region, ProductID, Category, Sub-Category, ProductName, Sales, Quantity, Discount, Profit, ShippingCost, OrderPriority, interval]\n",
       "Index: []\n",
       "\n",
       "[0 rows x 25 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
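   ],
   "source": [
    "# 2) Clean ShipMode\n",
    "# Null handling depends on the column type: numeric vs. string (strings are filled with the mode)\n",
    "\n",
    "data[data.ShipMode.isnull()]\n",
    "data.ShipMode.mode()  # inspect the mode\n",
    "# Assign the result back instead of calling fillna(inplace=True) on a column selection\n",
    "data['ShipMode'] = data['ShipMode'].fillna(data.ShipMode.mode()[0])\n",
    "data\n",
    "data[data.ShipMode.isnull()]  # should now be empty"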
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 51094 entries, 0 to 51100\n",
      "Data columns (total 25 columns):\n",
      " #   Column         Non-Null Count  Dtype         \n",
      "---  ------         --------------  -----         \n",
      " 0   RowID          51094 non-null  int64         \n",
      " 1   OrderID        51094 non-null  object        \n",
      " 2   OrderDate      51094 non-null  datetime64[ns]\n",
      " 3   ShipDate       51094 non-null  datetime64[ns]\n",
      " 4   ShipMode       51094 non-null  object        \n",
      " 5   CustomerID     51094 non-null  object        \n",
      " 6   CustomerName   51094 non-null  object        \n",
      " 7   Segment        51094 non-null  object        \n",
      " 8   City           51094 non-null  object        \n",
      " 9   State          51094 non-null  object        \n",
      " 10  Country        51094 non-null  object        \n",
      " 11  PostalCode     9962 non-null   float64       \n",
      " 12  Market         51094 non-null  object        \n",
      " 13  Region         51094 non-null  object        \n",
      " 14  ProductID      51094 non-null  object        \n",
      " 15  Category       51094 non-null  object        \n",
      " 16  Sub-Category   51094 non-null  object        \n",
      " 17  ProductName    51094 non-null  object        \n",
      " 18  Sales          51094 non-null  float64       \n",
      " 19  Quantity       51094 non-null  int64         \n",
      " 20  Discount       51094 non-null  float64       \n",
      " 21  Profit         51094 non-null  float64       \n",
      " 22  ShippingCost   51094 non-null  float64       \n",
      " 23  OrderPriority  51094 non-null  object        \n",
      " 24  interval       51094 non-null  float64       \n",
      "dtypes: datetime64[ns](2), float64(6), int64(2), object(15)\n",
      "memory usage: 10.1+ MB\n"
     ]
    }
   ],
   "source": [
    "# 3) Discount data\n",
    "data[data.Discount>1]  # outliers: Discount greater than 1\n",
    "data[data.Discount<0]\n",
    "\n",
    "# Outliers in a numeric column\n",
    "# Strategy: outlier ---> null ---> fill\n",
    "data['Discount'] = data['Discount'].mask(data['Discount']>1, None)\n",
    "data[data.Discount.isnull()]\n",
    "\n",
    "# Mean\n",
    "# meanDiscount averages only the non-null values, rounded to 2 decimals\n",
    "# (Series.mean() skips NaN by default, so data.Discount.mean() gives the same average)\n",
    "meanDiscount = round(data[data.Discount.notnull()].Discount.sum()/data[data.Discount.notnull()].Discount.size,2)\n",
    "data['Discount'] = data['Discount'].fillna(meanDiscount)\n",
    "data\n",
    "data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 4) Drop the PostalCode column\n",
    "# Hint: use drop(columns=[...])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Enter your code here\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Category</th>\n",
       "      <th>Sub-Category</th>\n",
       "      <th>ProductName</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Supplies</td>\n",
       "      <td>Acme Trimmer, High Speed</td>\n",
       "      <td>120.366</td>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "      <td>36.036</td>\n",
       "      <td>9.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Eaton Computer Printout Paper, 8.5 x 11</td>\n",
       "      <td>55.242</td>\n",
       "      <td>2</td>\n",
       "      <td>0.1</td>\n",
       "      <td>15.342</td>\n",
       "      <td>1.80</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Eldon Light Bulb, Duo Pack</td>\n",
       "      <td>113.670</td>\n",
       "      <td>5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>37.770</td>\n",
       "      <td>4.70</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>IT-2011-3647632</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>EM-14140</td>\n",
       "      <td>Eugene Moren</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Paper</td>\n",
       "      <td>Enermax Note Cards, Premium</td>\n",
       "      <td>44.865</td>\n",
       "      <td>3</td>\n",
       "      <td>0.5</td>\n",
       "      <td>-26.055</td>\n",
       "      <td>4.82</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>HU-2011-1220</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>AT-735</td>\n",
       "      <td>Annie Thurman</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Storage</td>\n",
       "      <td>Tenex Box, Single Width</td>\n",
       "      <td>66.120</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>29.640</td>\n",
       "      <td>8.17</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51096</th>\n",
       "      <td>51094</td>\n",
       "      <td>IN-2014-75603</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>BS-11365</td>\n",
       "      <td>Bill Shonely</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>Vijayawada</td>\n",
       "      <td>Andhra Pradesh</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Fasteners</td>\n",
       "      <td>Stockwell Thumb Tacks, Bulk Pack</td>\n",
       "      <td>39.420</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17.280</td>\n",
       "      <td>2.97</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51097</th>\n",
       "      <td>51095</td>\n",
       "      <td>TU-2014-5170</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>VD-11670</td>\n",
       "      <td>Valerie Dominguez</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Konya</td>\n",
       "      <td>Konya</td>\n",
       "      <td>...</td>\n",
       "      <td>Furniture</td>\n",
       "      <td>Furnishings</td>\n",
       "      <td>Tenex Frame, Erganomic</td>\n",
       "      <td>173.760</td>\n",
       "      <td>4</td>\n",
       "      <td>0.6</td>\n",
       "      <td>-117.360</td>\n",
       "      <td>13.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51098</th>\n",
       "      <td>51096</td>\n",
       "      <td>MO-2014-2560</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LP-7095</td>\n",
       "      <td>Liz Preis</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Agadir</td>\n",
       "      <td>Souss-M</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Hole Reinforcements, Clear</td>\n",
       "      <td>3.990</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.420</td>\n",
       "      <td>0.49</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51099</th>\n",
       "      <td>51097</td>\n",
       "      <td>ES-2014-4785777</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>DP-13390</td>\n",
       "      <td>Dennis Pardue</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Index Tab, Economy</td>\n",
       "      <td>32.250</td>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.250</td>\n",
       "      <td>2.21</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51100</th>\n",
       "      <td>51098</td>\n",
       "      <td>CA-2014-143259</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PO-18865</td>\n",
       "      <td>Patrick O'Donnell</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>New York City</td>\n",
       "      <td>New York</td>\n",
       "      <td>...</td>\n",
       "      <td>Office Supplies</td>\n",
       "      <td>Binders</td>\n",
       "      <td>Wilson Jones Legal Size Ring Binders</td>\n",
       "      <td>52.776</td>\n",
       "      <td>3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>19.791</td>\n",
       "      <td>7.21</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>51094 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       RowID          OrderID  OrderDate   ShipDate        ShipMode  \\\n",
       "0          1    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "1          2    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "2          3    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "3          4  IT-2011-3647632 2011-01-01 2011-01-05    Second Class   \n",
       "4          5     HU-2011-1220 2011-01-01 2011-01-05    Second Class   \n",
       "...      ...              ...        ...        ...             ...   \n",
       "51096  51094    IN-2014-75603 2014-12-31 2015-01-05    Second Class   \n",
       "51097  51095     TU-2014-5170 2014-12-31 2015-01-04    Second Class   \n",
       "51098  51096     MO-2014-2560 2014-12-31 2015-01-05  Standard Class   \n",
       "51099  51097  ES-2014-4785777 2014-12-31 2015-01-04  Standard Class   \n",
       "51100  51098   CA-2014-143259 2014-12-31 2015-01-04  Standard Class   \n",
       "\n",
       "      CustomerID       CustomerName      Segment           City  \\\n",
       "0       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "1       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "2       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "3       EM-14140       Eugene Moren  Home Office      Stockholm   \n",
       "4         AT-735      Annie Thurman     Consumer       Budapest   \n",
       "...          ...                ...          ...            ...   \n",
       "51096   BS-11365       Bill Shonely    Corporate     Vijayawada   \n",
       "51097   VD-11670  Valerie Dominguez     Consumer          Konya   \n",
       "51098    LP-7095          Liz Preis     Consumer         Agadir   \n",
       "51099   DP-13390      Dennis Pardue  Home Office        Hamburg   \n",
       "51100   PO-18865  Patrick O'Donnell     Consumer  New York City   \n",
       "\n",
       "                 State  ...         Category Sub-Category  \\\n",
       "0      New South Wales  ...  Office Supplies     Supplies   \n",
       "1      New South Wales  ...  Office Supplies        Paper   \n",
       "2      New South Wales  ...        Furniture  Furnishings   \n",
       "3            Stockholm  ...  Office Supplies        Paper   \n",
       "4             Budapest  ...  Office Supplies      Storage   \n",
       "...                ...  ...              ...          ...   \n",
       "51096   Andhra Pradesh  ...  Office Supplies    Fasteners   \n",
       "51097            Konya  ...        Furniture  Furnishings   \n",
       "51098          Souss-M  ...  Office Supplies      Binders   \n",
       "51099          Hamburg  ...  Office Supplies      Binders   \n",
       "51100         New York  ...  Office Supplies      Binders   \n",
       "\n",
       "                                   ProductName    Sales Quantity Discount  \\\n",
       "0                     Acme Trimmer, High Speed  120.366        3      0.1   \n",
       "1      Eaton Computer Printout Paper, 8.5 x 11   55.242        2      0.1   \n",
       "2                   Eldon Light Bulb, Duo Pack  113.670        5      0.1   \n",
       "3                  Enermax Note Cards, Premium   44.865        3      0.5   \n",
       "4                      Tenex Box, Single Width   66.120        4      0.0   \n",
       "...                                        ...      ...      ...      ...   \n",
       "51096         Stockwell Thumb Tacks, Bulk Pack   39.420        3      0.0   \n",
       "51097                   Tenex Frame, Erganomic  173.760        4      0.6   \n",
       "51098  Wilson Jones Hole Reinforcements, Clear    3.990        1      0.0   \n",
       "51099          Wilson Jones Index Tab, Economy   32.250        5      0.0   \n",
       "51100     Wilson Jones Legal Size Ring Binders   52.776        3      0.2   \n",
       "\n",
       "        Profit  ShippingCost  OrderPriority  interval  \n",
       "0       36.036          9.72         Medium  604800.0  \n",
       "1       15.342          1.80         Medium  604800.0  \n",
       "2       37.770          4.70         Medium  604800.0  \n",
       "3      -26.055          4.82           High  345600.0  \n",
       "4       29.640          8.17           High  345600.0  \n",
       "...        ...           ...            ...       ...  \n",
       "51096   17.280          2.97         Medium  432000.0  \n",
       "51097 -117.360         13.72         Medium  345600.0  \n",
       "51098    0.420          0.49         Medium  432000.0  \n",
       "51099    8.250          2.21         Medium  345600.0  \n",
       "51100   19.791          7.21           High  345600.0  \n",
       "\n",
       "[51094 rows x 24 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reference answer\n",
    "\n",
    "data.drop(columns=['PostalCode'],inplace=True)\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Data wrangling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowID</th>\n",
       "      <th>OrderID</th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>ShipDate</th>\n",
       "      <th>ShipMode</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>CustomerName</th>\n",
       "      <th>Segment</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>...</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Discount</th>\n",
       "      <th>Profit</th>\n",
       "      <th>ShippingCost</th>\n",
       "      <th>OrderPriority</th>\n",
       "      <th>interval</th>\n",
       "      <th>Order-year</th>\n",
       "      <th>Order-month</th>\n",
       "      <th>quarter</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>120.366</td>\n",
       "      <td>3</td>\n",
       "      <td>0.1</td>\n",
       "      <td>36.036</td>\n",
       "      <td>9.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "      <td>2011</td>\n",
       "      <td>1</td>\n",
       "      <td>2011Q1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>55.242</td>\n",
       "      <td>2</td>\n",
       "      <td>0.1</td>\n",
       "      <td>15.342</td>\n",
       "      <td>1.80</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "      <td>2011</td>\n",
       "      <td>1</td>\n",
       "      <td>2011Q1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>IN-2011-47883</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-08</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>JH-15985</td>\n",
       "      <td>Joseph Holt</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Wagga Wagga</td>\n",
       "      <td>New South Wales</td>\n",
       "      <td>...</td>\n",
       "      <td>113.670</td>\n",
       "      <td>5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>37.770</td>\n",
       "      <td>4.70</td>\n",
       "      <td>Medium</td>\n",
       "      <td>604800.0</td>\n",
       "      <td>2011</td>\n",
       "      <td>1</td>\n",
       "      <td>2011Q1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>IT-2011-3647632</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>EM-14140</td>\n",
       "      <td>Eugene Moren</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>...</td>\n",
       "      <td>44.865</td>\n",
       "      <td>3</td>\n",
       "      <td>0.5</td>\n",
       "      <td>-26.055</td>\n",
       "      <td>4.82</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "      <td>2011</td>\n",
       "      <td>1</td>\n",
       "      <td>2011Q1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>HU-2011-1220</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>AT-735</td>\n",
       "      <td>Annie Thurman</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>...</td>\n",
       "      <td>66.120</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>29.640</td>\n",
       "      <td>8.17</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "      <td>2011</td>\n",
       "      <td>1</td>\n",
       "      <td>2011Q1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51096</th>\n",
       "      <td>51094</td>\n",
       "      <td>IN-2014-75603</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>BS-11365</td>\n",
       "      <td>Bill Shonely</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>Vijayawada</td>\n",
       "      <td>Andhra Pradesh</td>\n",
       "      <td>...</td>\n",
       "      <td>39.420</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17.280</td>\n",
       "      <td>2.97</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>12</td>\n",
       "      <td>2014Q4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51097</th>\n",
       "      <td>51095</td>\n",
       "      <td>TU-2014-5170</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>VD-11670</td>\n",
       "      <td>Valerie Dominguez</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Konya</td>\n",
       "      <td>Konya</td>\n",
       "      <td>...</td>\n",
       "      <td>173.760</td>\n",
       "      <td>4</td>\n",
       "      <td>0.6</td>\n",
       "      <td>-117.360</td>\n",
       "      <td>13.72</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>12</td>\n",
       "      <td>2014Q4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51098</th>\n",
       "      <td>51096</td>\n",
       "      <td>MO-2014-2560</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LP-7095</td>\n",
       "      <td>Liz Preis</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>Agadir</td>\n",
       "      <td>Souss-M</td>\n",
       "      <td>...</td>\n",
       "      <td>3.990</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.420</td>\n",
       "      <td>0.49</td>\n",
       "      <td>Medium</td>\n",
       "      <td>432000.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>12</td>\n",
       "      <td>2014Q4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51099</th>\n",
       "      <td>51097</td>\n",
       "      <td>ES-2014-4785777</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>DP-13390</td>\n",
       "      <td>Dennis Pardue</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>Hamburg</td>\n",
       "      <td>...</td>\n",
       "      <td>32.250</td>\n",
       "      <td>5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.250</td>\n",
       "      <td>2.21</td>\n",
       "      <td>Medium</td>\n",
       "      <td>345600.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>12</td>\n",
       "      <td>2014Q4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51100</th>\n",
       "      <td>51098</td>\n",
       "      <td>CA-2014-143259</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PO-18865</td>\n",
       "      <td>Patrick O'Donnell</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>New York City</td>\n",
       "      <td>New York</td>\n",
       "      <td>...</td>\n",
       "      <td>52.776</td>\n",
       "      <td>3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>19.791</td>\n",
       "      <td>7.21</td>\n",
       "      <td>High</td>\n",
       "      <td>345600.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>12</td>\n",
       "      <td>2014Q4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>51094 rows × 27 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       RowID          OrderID  OrderDate   ShipDate        ShipMode  \\\n",
       "0          1    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "1          2    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "2          3    IN-2011-47883 2011-01-01 2011-01-08  Standard Class   \n",
       "3          4  IT-2011-3647632 2011-01-01 2011-01-05    Second Class   \n",
       "4          5     HU-2011-1220 2011-01-01 2011-01-05    Second Class   \n",
       "...      ...              ...        ...        ...             ...   \n",
       "51096  51094    IN-2014-75603 2014-12-31 2015-01-05    Second Class   \n",
       "51097  51095     TU-2014-5170 2014-12-31 2015-01-04    Second Class   \n",
       "51098  51096     MO-2014-2560 2014-12-31 2015-01-05  Standard Class   \n",
       "51099  51097  ES-2014-4785777 2014-12-31 2015-01-04  Standard Class   \n",
       "51100  51098   CA-2014-143259 2014-12-31 2015-01-04  Standard Class   \n",
       "\n",
       "      CustomerID       CustomerName      Segment           City  \\\n",
       "0       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "1       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "2       JH-15985        Joseph Holt     Consumer    Wagga Wagga   \n",
       "3       EM-14140       Eugene Moren  Home Office      Stockholm   \n",
       "4         AT-735      Annie Thurman     Consumer       Budapest   \n",
       "...          ...                ...          ...            ...   \n",
       "51096   BS-11365       Bill Shonely    Corporate     Vijayawada   \n",
       "51097   VD-11670  Valerie Dominguez     Consumer          Konya   \n",
       "51098    LP-7095          Liz Preis     Consumer         Agadir   \n",
       "51099   DP-13390      Dennis Pardue  Home Office        Hamburg   \n",
       "51100   PO-18865  Patrick O'Donnell     Consumer  New York City   \n",
       "\n",
       "                 State  ...    Sales Quantity Discount   Profit ShippingCost  \\\n",
       "0      New South Wales  ...  120.366        3      0.1   36.036         9.72   \n",
       "1      New South Wales  ...   55.242        2      0.1   15.342         1.80   \n",
       "2      New South Wales  ...  113.670        5      0.1   37.770         4.70   \n",
       "3            Stockholm  ...   44.865        3      0.5  -26.055         4.82   \n",
       "4             Budapest  ...   66.120        4      0.0   29.640         8.17   \n",
       "...                ...  ...      ...      ...      ...      ...          ...   \n",
       "51096   Andhra Pradesh  ...   39.420        3      0.0   17.280         2.97   \n",
       "51097            Konya  ...  173.760        4      0.6 -117.360        13.72   \n",
       "51098          Souss-M  ...    3.990        1      0.0    0.420         0.49   \n",
       "51099          Hamburg  ...   32.250        5      0.0    8.250         2.21   \n",
       "51100         New York  ...   52.776        3      0.2   19.791         7.21   \n",
       "\n",
       "      OrderPriority  interval  Order-year  Order-month  quarter  \n",
       "0            Medium  604800.0        2011            1   2011Q1  \n",
       "1            Medium  604800.0        2011            1   2011Q1  \n",
       "2            Medium  604800.0        2011            1   2011Q1  \n",
       "3              High  345600.0        2011            1   2011Q1  \n",
       "4              High  345600.0        2011            1   2011Q1  \n",
       "...             ...       ...         ...          ...      ...  \n",
       "51096        Medium  432000.0        2014           12   2014Q4  \n",
       "51097        Medium  345600.0        2014           12   2014Q4  \n",
       "51098        Medium  432000.0        2014           12   2014Q4  \n",
       "51099        Medium  345600.0        2014           12   2014Q4  \n",
       "51100          High  345600.0        2014           12   2014Q4  \n",
       "\n",
       "[51094 rows x 27 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['Order-year'] = data['OrderDate'].dt.year \n",
    "data['Order-month'] = data['OrderDate'].dt.month\n",
    "data['quarter'] = data['OrderDate'].dt.to_period('Q')\n",
    "data\n",
    "#清洗: 数据分析使用到相应的数据,评估 (数字类型字段)   数据重要性\n",
    "#整理: 分析维度 , 整理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
